1.  Introduction


Let $P_1,\ldots,P_k$ be $k$ normal populations $N(\theta_1,\sigma_1^2),\ldots,N(\theta_k,\sigma_k^2)$, respectively, where the
means $\theta_1,\ldots,\theta_k \in \mathbb{R}$ are unknown, and where the variances $\sigma_1^2,\ldots,\sigma_k^2 > 0$ are known. Suppose
one wants to select, based on independent random samples of respective sizes $n_1,\ldots,n_k$, a non-empty
subset of these populations which contains the population with the largest $\theta$-value, i.e. the
best population, with a high probability, where the size of the subset should be as small as
possible. Clearly, these two goals work against each other, much as in hypothesis testing, where
the goal is to keep both types of error probability small. As in the latter, a natural approach is to
control one of the two goals and then to optimize the other one.

The classical approach, due to Gupta (1956, 1965), is to consider subset selection procedures
$S$ for which the minimum probability of a correct selection (CS) is at least $P^*$, where $P^*$ is
predetermined. To exclude the possibility of simply selecting just one of the populations at
random, $P^*$ is assumed to be greater than $k^{-1}$. Among all subset selection procedures $S$ which
satisfy this $P^*$-condition, an optimum procedure would have the smallest expected size $E_\theta(|S|)$
for all $\theta \in \mathbb{R}^k$. Unfortunately, no such optimum procedure exists (Deely and Johnson, 1997).
Therefore, attempts have been made to find suitable candidates which perform well in
terms of their expected subset sizes in comparison to the other procedures.

By sufficiency, only subset selection procedures which depend on the $k$ sample means
$X = (X_1,\ldots,X_k)$, with realizations $x = (x_1,\ldots,x_k) \in \mathbb{R}^k$, have to be considered. The underlying
model simplifies to $X_i \sim N(\theta_i, p_i^{-1})$, where $p_i = n_i/\sigma_i^2$ denotes the precision of the sample mean
from population $P_i$, $i = 1,\ldots,k$, and $X_1,\ldots,X_k$ are independent. For notational convenience, let
in the sequel $\vartheta_{[1]} \le \vartheta_{[2]} \le \ldots \le \vartheta_{[k]}$ denote the ordered values of any $\vartheta \in \mathbb{R}^k$. Further, let $\varphi$ and
$\Phi$ denote the density and the c.d.f., respectively, of the standard normal distribution $N(0,1)$.

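For the case of $\sigma_1^2 = \ldots = \sigma_k^2 = \sigma^2 > 0$ and $n_1 = \ldots = n_k = n_0$, Gupta (1956, 1965) has proposed
the following, now classical, subset selection procedure.

$$S_{\mathrm{Gupta}}(x) = \{\, i \mid x_i \ge x_{[k]} - d\sigma/\sqrt{n_0},\ i = 1,\ldots,k \,\}, \quad\text{where}\quad \int_{\mathbb{R}} \Phi^{k-1}(z+d)\,\varphi(z)\,dz = P^*. \tag{1}$$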
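The constant $d$ in (1) has no closed form for $k > 2$ and is usually found numerically. The following is a minimal sketch, assuming Simpson's rule on $[-8, 8]$ for the integral and bisection for the root; the function names `pcs_integral`, `gupta_constant`, and `gupta_subset` are hypothetical and not part of the original paper.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def pcs_integral(d, k, lo=-8.0, hi=8.0, steps=800):
    """Simpson approximation of the integral of Phi(z + d)^(k-1) * phi(z) dz."""
    h = (hi - lo) / steps  # steps must be even for Simpson's rule
    total = 0.0
    for m in range(steps + 1):
        z = lo + m * h
        weight = 1 if m in (0, steps) else (4 if m % 2 else 2)
        total += weight * STD.cdf(z + d) ** (k - 1) * STD.pdf(z)
    return total * h / 3.0


def gupta_constant(k, p_star, tol=1e-8):
    """Solve pcs_integral(d, k) = p_star for d by bisection (needs 1/k < p_star < 1)."""
    lo, hi = 0.0, 20.0  # the integral equals 1/k at d = 0 and increases in d
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pcs_integral(mid, k) < p_star:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gupta_subset(xbar, d, sigma, n0):
    """0-based indices i with xbar[i] >= max(xbar) - d * sigma / sqrt(n0)."""
    cutoff = max(xbar) - d * sigma / n0 ** 0.5
    return [i for i, x in enumerate(xbar) if x >= cutoff]
```

For $k = 2$ the integral reduces to $\Phi(d/\sqrt{2})$, which gives a simple check of the numerical solution.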

Modifications of this procedure to cases where the variances and the sample sizes are not both
common have been proposed by Chen and Dudewicz (1973), Gupta and Huang, W.T. (1974),
Gupta and Huang, D.Y. (1976), and Gupta and Wong (1976). They are all of the common form

$$S_G(x) = \{\, i \mid x_i \ge x_j - c_{ij} \ \text{for all } j \ne i,\ i = 1,\ldots,k \,\},$$

where the $c_{ij}$'s are positive and are chosen to meet the $P^*$-condition in their given settings.
Berger and Gupta (1980) have shown that $S_{\mathrm{Gupta}}$ and some, but not all, of its modifications of the
type $S_G$ to other settings are minimax, in terms of the expected subset size, in the class of subset
selection procedures which satisfy the $P^*$-condition and are non-randomized, just, and translation
invariant. Gupta and Miescke (1981) have shown that for $k = 3$, $S_G$ is optimum, in terms of the
expected subset size, in the class of subset selection procedures proposed by Seal (1955, 1957),
when the distances between $\theta_1,\ldots,\theta_k$ are larger than certain positive constants. Similar results
for a particular slippage configuration of the parameters had been derived previously by Deely
and Gupta (1968). Procedures of this type have been the object of numerous investigations in the
literature. An overview and further references can be found in the monographs by Gupta and
Panchapakesan (1979) and by Bechhofer, Santner, and Goldsman (1995), and recent results in
Miescke and Rasch (1996).

The purpose of the present study is to look at the above-mentioned subset selection
procedures from a decision theoretic point of view, by means of comparisons with Bayes
selection procedures under two natural types of loss function, one of which is closely related
to the problem of minimizing the expected subset size subject to the $P^*$-condition.

2.  Losses  and  Frequentist  Risks 

Let $L(\theta,s)$ be a given loss function for selecting the populations $P_i$ with $i \in s$ at
$\theta = (\theta_1,\ldots,\theta_k) \in \mathbb{R}^k$. A non-randomized subset selection procedure $S$ is a measurable function
from the sample space $\mathbb{R}^k$ into the set $\{\, s \mid \emptyset \ne s \subseteq \{1,\ldots,k\} \,\}$, where the elements of $S$ are the
indices of the populations that are selected. More generally, one could also consider randomized
subset selection procedures $S^*$, where at every $X = x$, population $P_i$ is selected by $S^*$ with
probability $p_{S^*,i}(x)$, $i = 1,\ldots,k$. However, since this does not lead to any further improvement,
such considerations will be postponed until the end.

Consider now a natural loss function, with some variations, that supports the goals given
in Section 1. One possible conflict, however, should be discussed first. It is not immediately clear
whether situations where populations are tied for the largest parameter are statistically relevant.
One could argue that nothing in real life repeats itself in an identical manner. On the other hand,
two of the populations could indeed be associated with the same experimental setup. For the sake
of completeness, suppose that the occurrence of such ties is statistically relevant. Then several
variations of the natural loss function have to be distinguished.

$$\begin{aligned}
L_1(\theta,s) &= \sum_{i\in s}\bigl[\,a - I_{\{\theta_{[k]}\}}(\theta_i)\,\bigr], \\
L_2(\theta,s) &= \sum_{i\in s}\bigl[\,a - I_{\{\tau(\theta)\}}(i)\,\bigr], \quad\text{where } \tau(\theta)\in\{\, i \mid \theta_i=\theta_{[k]} \,\}, \\
L_3(\theta,s) &= \sum_{i\in s} a \;-\; I_{\{\theta_{[k]}\}}\bigl(\max_{i\in s}\{\theta_i\}\bigr), \qquad \theta\in\mathbb{R}^k,\ \emptyset\ne s\subseteq\{1,\ldots,k\},\ 0<a<1.
\end{aligned} \tag{2}$$

The first version, $L_1$, rewards inclusion of all populations which have the largest parameter; the
second, $L_2$, rewards inclusion of exactly one population that has been tagged as the best by a
given function $\tau$; and the third, $L_3$, rewards the inclusion of only one, but any one, population
which has the largest parameter. The choice of $a = 0$ ($a \ge 1$) has been omitted since it supports
the selection of all $k$ (only one of the) populations. A positive factor $b$ could have been attached to
the indicator functions in (2), with the restriction $0 < a < b$, but without gain of generality.

A modified version of $L_1$, where the gain of including a best population depends on the
number of populations tied for the best, has been used in Gupta and Hsu (1977). $L_2$ has been
used only implicitly in the literature, as can be understood from the respective contexts. $L_3$ has
been used in Bratcher and Bhalla (1974) and in simulation studies of Gupta and Hsu (1978). In
the sequel, situations where populations are tied for the best will have probability zero, and then
distinctions between $L_1$, $L_2$, and $L_3$ will be irrelevant. $L_1$ then proves to be the most convenient
version, and will thus be used. Let it henceforth be called "$(0,a,a-1)$ loss", and denoted by $L$.
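The difference between the three loss variants only shows up when several parameters are tied for the largest value. The sketch below is an illustrative implementation, assuming 0-based indices and a list `s` of selected indices; the function names and the argument `tagged` (the index returned by the tagging function) are hypothetical.

```python
def loss_1(theta, s, a):
    """L1: every selected index costs a, minus 1 if its parameter equals the largest."""
    top = max(theta)
    return sum(a - (1 if theta[i] == top else 0) for i in s)


def loss_2(theta, s, a, tagged):
    """L2: only the single index `tagged` (assumed to be one of the best) is rewarded."""
    return sum(a - (1 if i == tagged else 0) for i in s)


def loss_3(theta, s, a):
    """L3: a reward of 1 is granted once if any selected parameter equals the largest."""
    top = max(theta)
    return a * len(s) - (1 if max(theta[i] for i in s) == top else 0)
```

With `theta = [0, 1, 1]`, `s = [1, 2]` and `a = 0.25`, the three versions give -1.5, -0.5 and -0.5, respectively, whereas they coincide as soon as the largest parameter is unique.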

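The frequentist risk of a subset selection procedure $S$ under loss $L$ is given by

$$\begin{aligned}
R(\theta,S) = E_\theta\bigl(L(\theta,S(X))\bigr) &= \sum_{i=1}^{k}\bigl[\,a - I_{\{\theta_{[k]}\}}(\theta_i)\,\bigr]\,P_\theta\{i\in S(X)\} \\
&= a\,E_\theta(|S(X)|) - \sum_{i\in A(\theta)} P_\theta\{i\in S(X)\},
\end{aligned} \tag{3}$$

where $A(\theta) = \{\, j \mid \theta_j = \theta_{[k]} \,\}$, $\theta\in\mathbb{R}^k$. On $\Omega = \{\, \theta \mid \theta_{[k-1]} < \theta_{[k]},\ \theta\in\mathbb{R}^k \,\}$, this simplifies to

$$R(\theta,S) = a\,E_\theta(|S(X)|) - P_\theta\{i^*(\theta)\in S(X)\}, \quad\text{where } \theta_{i^*(\theta)} = \theta_{[k]},\ \theta\in\Omega. \tag{4}$$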

The risk function of a subset selection rule $S$ is not continuous on $\mathbb{R}^k$, since in (3) the
size of $A(\theta)$ can drop down to 1 in every neighborhood of a point $\theta\in\mathbb{R}^k$ with $\theta_{[k-1]} = \theta_{[k]}$.

Obviously, for every subset selection procedure $S$, its minimum probability of a correct
selection (CS) must occur on $\Omega$. This fact will be used later on in Section 3.
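The two ingredients of the risk, the expected subset size and the probability of a correct selection, are easy to estimate by simulation at any parameter point with a unique largest component. The following is a rough Monte Carlo sketch, assuming independent normal sample means with given precisions; `simulate_risk` and `make_gupta_rule` are hypothetical helper names, and the threshold `c` stands for $d\sigma/\sqrt{n_0}$ in Gupta's rule.

```python
import random


def simulate_risk(theta, precision, rule, a, reps=20000, seed=0):
    """Estimate E|S|, P(CS) and the risk a*E|S| - P(CS) of `rule` at theta.

    Assumes that theta has a unique largest component.
    """
    rng = random.Random(seed)
    best = max(range(len(theta)), key=lambda i: theta[i])
    size_sum = 0
    cs_count = 0
    for _ in range(reps):
        # Sample mean i is N(theta_i, 1/precision_i).
        x = [rng.gauss(t, p ** -0.5) for t, p in zip(theta, precision)]
        s = rule(x)
        size_sum += len(s)
        cs_count += best in s
    exp_size = size_sum / reps
    pcs = cs_count / reps
    return exp_size, pcs, a * exp_size - pcs


def make_gupta_rule(c):
    """Rule that selects every i with x_i >= max(x) - c."""
    return lambda x: [i for i, xi in enumerate(x) if xi >= max(x) - c]
```

A call such as `simulate_risk([0.0, 0.0, 1.0], [1.0, 1.0, 1.0], make_gupta_rule(1.0), 0.3)` returns the three estimates as a tuple; the precision of the estimates depends on `reps`.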

Some interesting features of this risk will now be discussed. Consider the following
scenarios, which may hold for any two given selection rules $S_1$ and $S_2$.

$$\begin{aligned}
&\text{(a)} \quad R(\theta,S_1) \le R(\theta,S_2), && \theta\in\Omega, \\
&\text{(b)} \quad P_\theta\{\text{CS under } S_1(X)\} \ge P_\theta\{\text{CS under } S_2(X)\}, && \theta\in\Omega, \\
&\text{(c)} \quad E_\theta(|S_1(X)|) \le E_\theta(|S_2(X)|), && \theta\in\Omega.
\end{aligned} \tag{5}$$

Also, let (b') be (b) with "$\ge$" replaced by "$\le$", and (c') be (c) with "$\le$" replaced by "$\ge$". Then
(b) and (c) together imply (a), (a) and (b') together imply (c), and (a) and (c') together imply (b).
Moreover, for a subset selection rule $S_1$ which is admissible, there cannot be a subset selection
rule $S_2$ which satisfies both (b') and (c'), with at least one strict inequality in (b') or (c'). In
other words, one can state the following.
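The three implications follow directly from the decomposition of the risk in (4). A short worked derivation is sketched below, where the shorthand $E_m$ and $C_m$ for the expected subset size and the probability of a correct selection of rule $S_m$ is introduced only for this illustration.

```latex
% Shorthand (illustration only), for m = 1, 2 and fixed theta with unique largest component:
%   E_m = E_\theta(|S_m(X)|),  C_m = P_\theta\{\text{CS under } S_m(X)\},
%   so that (4) reads R(\theta,S_m) = a E_m - C_m with a > 0.
\begin{align*}
\text{(b), (c)}  &\;\Rightarrow\; a E_1 - C_1 \le a E_2 - C_2,
                 && \text{i.e. (a)};\\
\text{(a), (b')} &\;\Rightarrow\; a E_1 = R(\theta,S_1) + C_1 \le R(\theta,S_2) + C_2 = a E_2,
                 && \text{i.e. (c)};\\
\text{(a), (c')} &\;\Rightarrow\; C_1 = a E_1 - R(\theta,S_1) \ge a E_2 - R(\theta,S_2) = C_2,
                 && \text{i.e. (b)}.
\end{align*}
```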

Lemma 1. Let $\Omega$ be the parameter space, and $S$ be a subset selection rule which is admissible
in the class of non-randomized subset selection rules under loss $L$. Then there does not exist any
subset selection rule which is as good as $S$ in terms of both probability of correct selection and
expected subset size, and better in at least one of the two at some point in $\Omega$.

Since there does not exist a subset selection rule which minimizes the expected subset
size, uniformly in $\theta\in\Omega$, subject to the $P^*$-condition, it is reasonable to consider subset selection
rules which are admissible and satisfy the $P^*$-condition. As will be seen later, such rules can
be found in the class of Bayes subset selection rules.

Other loss functions for subset selection procedures which have been used in the
literature depend more smoothly on the distances of the selected parameters to the largest
parameter, and are discussed in Miescke (1997). A natural loss function in this class is the
so-called "linear loss", which has been used in Miescke (1979), and is given by

$$L_*(\theta,s) = \sum_{i\in s}\bigl[\,\theta_{[k]} - \theta_i - \Delta\,\bigr], \qquad \theta\in\mathbb{R}^k,\ \emptyset\ne s\subseteq\{1,\ldots,k\},\ 0<\Delta. \tag{6}$$

Although not a linear function of $\theta$, it increases linearly when non-largest parameters of
populations in the selected subset move away from the largest, hence its name. More
specifically, under this loss function, selection of a population with its parameter less (more) than
$\Delta$ away from the largest parameter is rewarded (penalized) on a linear scale.
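As a small illustration of the linear loss in (6), the following sketch evaluates it for a given parameter vector and selected subset. The function name `linear_loss` and the argument name `delta` (the positive constant of the loss) are hypothetical.

```python
def linear_loss(theta, s, delta):
    """Linear loss (6): sum over selected i of (largest theta - theta_i - delta)."""
    top = max(theta)
    return sum(top - theta[i] - delta for i in s)
```

For `theta = [0.0, 1.5, 2.0]` and `delta = 1.0`, selecting only the best population gives a loss of -1.0, adding the second population (which is 0.5 away from the best) lowers the loss to -1.5, and selecting only the first population (2.0 away from the best) gives a loss of 1.0.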

The frequentist risk of a subset selection procedure $S$ under loss $L_*$ is given by

$$\begin{aligned}
R_*(\theta,S) = E_\theta\bigl(L_*(\theta,S(X))\bigr) &= \sum_{i=1}^{k}\bigl[\,\theta_{[k]} - \theta_i - \Delta\,\bigr]\,P_\theta\{i\in S(X)\} \\
&= \bigl[\,\theta_{[k]} + \gamma - \Delta\,\bigr]\,E_\theta(|S(X)|) - \sum_{i=1}^{k}\bigl[\,\theta_i + \gamma\,\bigr]\,P_\theta\{i\in S(X)\}, \qquad \theta\in\mathbb{R}^k,
\end{aligned} \tag{7}$$

where $\gamma$ is chosen to have $\theta_{[k]} + \gamma - \Delta > 0$. Situations where populations are tied for the best do
not cause the discontinuity conflicts that were observed under the $(0,a,a-1)$ loss. Implications
analogous to those stated below (5) hold for the terms in the lower part of (7), by simply replacing
$P_\theta\{\text{CS under } S(X)\}$ in (5) by $\sum_{i=1}^{k}[\theta_i + \gamma]\,P_\theta\{i\in S(X)\}$. Clearly, the latter is not an expectation.
It should be treated as an inner product, where its Schur-type monotonicity is relevant for
performance considerations.

3.  Bayes  Risks  and  new  Procedures 

Let $X_1,\ldots,X_k$ be the sample means, based on independent samples of sizes $n_1,\ldots,n_k$,
from $k$ normal populations $P_1,\ldots,P_k$, respectively. It is assumed that $X_i \sim N(\theta_i,p_i^{-1})$, where
$p_i = n_i/\sigma_i^2$ denotes the precision of the sample mean $X_i$, and that $\sigma_i^2 > 0$ is known, $i = 1,\ldots,k$.
The performance of a selection rule $S$ is measured by its expected loss, i.e. by its frequentist risk
$R(\theta,S) = E_\theta(L(\theta,S(X)))$, at parameter configurations $\theta\in\mathbb{R}^k$.

For admissibility considerations of subset selection rules $S$, the present framework will
now be extended to the Bayesian approach. Starting with the pioneering work by Dunnett (1960),
Bayesian selection procedures have been studied on numerous occasions in the literature. An
overview of these results can be found in Miescke (1997). From now on, it is assumed that the
parameters $\Theta = (\Theta_1,\ldots,\Theta_k)$, say, are a priori random variables which follow a known prior
density $\pi(\theta)$, $\theta\in\mathbb{R}^k$. More specifically, it will be assumed below that a priori, $\Theta_i \sim N(\mu_i,\nu_i^{-1})$,
$i = 1,\ldots,k$, where the prior mean $\mu_i\in\mathbb{R}$ and the precision $\nu_i > 0$, $i = 1,\ldots,k$, are known, and
where $\Theta_1,\ldots,\Theta_k$ are independent. A posteriori, given $X = x$, $\Theta_i \sim N(\mu_i(x_i),\tau_i^{-1})$, with
$\mu_i(x_i) = (p_i x_i + \nu_i\mu_i)/(p_i + \nu_i)$ and $\tau_i = p_i + \nu_i$, $i = 1,\ldots,k$, and $\Theta_1,\ldots,\Theta_k$ are independent.
Moreover, marginally, $X_i \sim N(\mu_i, p_i^{-1} + \nu_i^{-1})$, $i = 1,\ldots,k$, and $X_1,\ldots,X_k$ are independent.
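The posterior update is a coordinate-wise precision-weighted average of the sample mean and the prior mean. A minimal sketch of this computation follows; the function name `posterior` and its argument names are hypothetical.

```python
def posterior(x, p, mu, nu):
    """Posterior means and precisions of the parameters given the sample means x.

    x: sample means, p: sampling precisions, mu: prior means, nu: prior precisions.
    """
    means = [(pi * xi + ni * mi) / (pi + ni) for xi, pi, mi, ni in zip(x, p, mu, nu)]
    precisions = [pi + ni for pi, ni in zip(p, nu)]
    return means, precisions
```

For example, with prior mean 0 and prior precision equal to the sampling precision divided by $n$, the posterior mean is the sample mean shrunk by the factor $n/(n+1)$.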

Let $\Omega = \{\, \theta \mid \theta_{[k-1]} < \theta_{[k]},\ \theta\in\mathbb{R}^k \,\}$ be that part of the parameter space where the largest of
the $k$ parameters is unique. Since $P\{\Theta\in\Omega\} = 1$, the parameter space can be replaced by $\Omega$,
which will be done from now on.

First, the "$(0,a,a-1)$ loss" will be considered. Since all three loss functions in (2) are
identical on $\Omega$, for convenience, $L(\theta,s) = L_1(\theta,s)$, $\theta\in\Omega$, will be used in the sequel. It should
be noted that $a-1 \le L(\theta,s) \le (k-1)a$ holds, and that the two bounds can be attained. On $\Omega$, the
frequentist risk $R(\theta,S) = E_\theta(L(\theta,S(X)))$ of a subset selection rule $S$ under loss $L$ is given by (4),
and it inherits the two bounds from $L$. Likewise, the lower and the upper bound can be attained,
as can be seen from (4) by using a no-data rule that selects 1 or $k-1$, respectively, fixed
populations.

The Bayes rules $S^{\pi}$ for the normal prior density $\pi(\theta)$, $\theta\in\mathbb{R}^k$, under loss function $L$,
minimize the posterior risk, i.e. the posterior expected loss, at every $X = x\in\mathbb{R}^k$, and are

$$S^{\pi}(x) = \begin{cases}
\{\, i \mid P\{\Theta_i = \Theta_{[k]} \mid X = x\} \ge a,\ i = 1,\ldots,k \,\}, & \text{if this set is not empty,} \\
\{i_0\} \ \text{for any } i_0 \text{ with } P\{\Theta_{i_0} = \Theta_{[k]} \mid X = x\} = \max_{j} P\{\Theta_j = \Theta_{[k]} \mid X = x\}, & \text{otherwise,}
\end{cases} \tag{8}$$

with the option to drop elements in the first set for which equality occurs. This follows from the
fact that the Bayes risk, which is the expected minimum of the $2^k - 1$ posterior risks that are
associated with all non-empty subset selections $s(X)$, is equal to

$$r(\pi,S^{\pi}) = E\Bigl(\,\min_{s(X)\ne\emptyset}\ \sum_{i\in s(X)} \bigl[\,a - P\{\Theta_i = \Theta_{[k]} \mid X\}\,\bigr]\Bigr). \tag{9}$$


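The Bayes risk of any subset selection procedure $S$, including $S^{\pi}$, can be represented by

$$r(\pi,S) = E\Bigl(\,\sum_{i\in S(X)} \bigl[\,a - P\{\Theta_i = \Theta_{[k]} \mid X\}\,\bigr]\Bigr). \tag{10}$$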


The lower bound $a-1$ and the upper bound $(k-1)a$ on the frequentist risk are inherited
by the Bayes risk, but may not be attained by the latter.

A classical approach to verifying admissibility is the method of Blyth (1951). As in
Miescke and Park (1997), it will be shown that all assumptions, including (a), (b), and (c), of
Theorem 13 in Berger (1985, p. 547), which provides admissibility, can be met. Although $\Omega$ is
not convex, it differs from $\mathbb{R}^k$ only by a Lebesgue null set, which is sufficient for the proof of
that theorem to remain valid.

In the present setting, the frequentist risk of every subset selection procedure $S$ is a
continuous function of $\theta\in\Omega$. This follows from the fact that under loss $L$, the risk of $S$ at any
$\theta\in\Omega$, with $\theta_{i^*(\theta)} = \theta_{[k]}$, which is given by (4), can be written as

$$R(\theta,S) = a\sum_{i=1}^{k} P_\theta\{i\in S(X)\} - P_\theta\{i^*(\theta)\in S(X)\}, \tag{11}$$

where $P_\theta\{r\in S(X)\}$ is continuous on $\Omega$, $r = 1,\ldots,k$, since the underlying distribution is a
regular $k$-parameter exponential family. It follows that all Bayes rules for the prior density $\pi(\theta)$
are admissible on $\Omega$ under loss $L$ in the class of non-randomized subset selection rules.
Moreover, a subset selection rule $S$ which is admissible on $\Omega$ must be admissible on $\mathbb{R}^k$, which
can be seen easily from (3) and the fact that $P_\theta\{r\in S(X)\}$ is also continuous on $\mathbb{R}^k$, $r = 1,\ldots,k$.

The set $N = \{\, x \mid x_{[k-1]} = x_{[k]} \,\}$ is a Lebesgue null set in $\mathbb{R}^k$, which has no effect on the
frequentist and Bayes risks considered below. Therefore, the sample space can also be replaced
by $\Omega$, as long as Bayes risks for a prior density $\pi(\theta)$ on $\mathbb{R}^k$ are concerned. This proves to be
convenient and will thus be done in the sequel. For $n = 1,2,\ldots$, let $\pi_n$ be the prior density $\pi(\theta)$
with the choices of $\mu_i = 0$ and $\nu_i^{-1} = n/p_i$, $i = 1,\ldots,k$. Furthermore, let $\pi_n^*$ be the improper
prior density given by $\pi_n^*(\theta) = n^{k/2}\pi_n(\theta)$, i.e.

$$\pi_n^*(\theta) = (2\pi)^{-k/2}\prod_{i=1}^{k} p_i^{1/2}\exp\bigl(-(2n)^{-1}p_i\theta_i^2\bigr), \qquad \theta\in\mathbb{R}^k. \tag{12}$$
(a) The Bayes risks $r(\pi_n^*,S)$ of any subset selection procedure $S$, and thus also $r(\pi_n^*,S^{\pi_n})$ of
$S^{\pi_n}$, which is the Bayes procedure with respect to $\pi_n$ (as well as the generalized Bayes procedure
with respect to $\pi_n^*$), are finite for all $n = 1,2,\ldots$, since the loss function $L$, given by $L_1$ in (2), is
bounded by $a-1 \le L(\theta,s) \le (k-1)a$.

(b) For any convex set $K\subset\mathbb{R}^k$ that is non-degenerate, i.e. that has a positive Lebesgue
measure, there exists a $Q > 0$ and an integer $M$ such that for $n \ge M$,

$$\int_K \pi_n^*(\theta)\,d\theta \ge Q. \tag{13}$$

This follows from the fact that $\pi_1^* \le \pi_n^*$ for every integer $n \ge 1$.

(c) To prove that a subset selection procedure $S$ is admissible on $\Omega$ under loss $L$ in the class
of all non-randomized subset selection rules, it is sufficient to show that

$$\lim_{n\to\infty}\bigl[\,r(\pi_n^*,S) - r(\pi_n^*,S^{\pi_n})\,\bigr] = \lim_{n\to\infty} n^{k/2}\bigl[\,r(\pi_n,S) - r(\pi_n,S^{\pi_n})\,\bigr] = 0. \tag{14}$$

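The Bayes risk of the Bayes subset selection procedure $S^{\pi_n}$ for prior $\pi_n$, $n = 1,2,\ldots$, will
now be examined in detail. After rewriting (9) in terms of integrals, one has for $n = 1,2,\ldots$,

$$\begin{aligned}
r(\pi_n,S^{\pi_n}) = \int_{\Omega}\ \min_{s(x)\ne\emptyset}\ \sum_{i\in s(x)} &\Bigl[\,a - \int_{\mathbb{R}}\prod_{j\ne i}\Phi\bigl(p_j^{1/2}\bigl[[n/(n+1)]^{1/2}(x_i - x_j) + p_i^{-1/2}z\bigr]\bigr)\varphi(z)\,dz\,\Bigr] \\
&\times \prod_{r=1}^{k}[p_r/(n+1)]^{1/2}\,\varphi\bigl([p_r/(n+1)]^{1/2}x_r\bigr)\,dx_r.
\end{aligned} \tag{15}$$

The minimum in (15) is taken pointwise, at every $x\in\Omega$, over the $2^k - 1$ non-empty subsets of
$\{1,\ldots,k\}$. A change of variables $x_r = \sqrt{n+1}\,z_r$, $r = 1,\ldots,k$, leads to the alternative representation

$$\begin{aligned}
r(\pi_n,S^{\pi_n}) = \int_{\Omega}\ \min_{s(z)\ne\emptyset}\ \sum_{i\in s(z)} &\Bigl[\,a - \int_{\mathbb{R}}\prod_{j\ne i}\Phi\bigl(p_j^{1/2}\bigl[n^{1/2}(z_i - z_j) + p_i^{-1/2}z\bigr]\bigr)\varphi(z)\,dz\,\Bigr] \\
&\times \prod_{r=1}^{k}p_r^{1/2}\,\varphi\bigl(p_r^{1/2}z_r\bigr)\,dz_r.
\end{aligned} \tag{16}$$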

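The Bayes risk of any subset selection procedure $S$ can be seen in the same way to have,
analogously to (15) and (16), the two representations for $n = 1,2,\ldots$

$$\begin{aligned}
r(\pi_n,S) &= \int_{\Omega}\ \sum_{i\in S(x)} \Bigl[\,a - \int_{\mathbb{R}}\prod_{j\ne i}\Phi\bigl(p_j^{1/2}\bigl[[n/(n+1)]^{1/2}(x_i - x_j) + p_i^{-1/2}z\bigr]\bigr)\varphi(z)\,dz\,\Bigr]
\prod_{r=1}^{k}[p_r/(n+1)]^{1/2}\,\varphi\bigl([p_r/(n+1)]^{1/2}x_r\bigr)\,dx_r \\
&= \int_{\Omega}\ \sum_{i\in S((n+1)^{1/2}z)} \Bigl[\,a - \int_{\mathbb{R}}\prod_{j\ne i}\Phi\bigl(p_j^{1/2}\bigl[n^{1/2}(z_i - z_j) + p_i^{-1/2}z\bigr]\bigr)\varphi(z)\,dz\,\Bigr]
\prod_{r=1}^{k}p_r^{1/2}\,\varphi\bigl(p_r^{1/2}z_r\bigr)\,dz_r.
\end{aligned} \tag{17}$$

As in (15) and (16), $x$ plays the role of the $k$ sample means $X = x\in\Omega$, and $z = (n+1)^{-1/2}x$.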

An interesting subset selection rule is the generalized Bayes rule $S^{Gen}$ for the Lebesgue
measure used as non-informative prior on $\mathbb{R}^k$. It selects, at $x\in\mathbb{R}^k$, any subset which achieves

$$r(x) := \min_{s(x)\ne\emptyset}\ \sum_{i\in s(x)} \Bigl[\,a - \int_{\mathbb{R}}\prod_{j\ne i}\Phi\bigl(p_j^{1/2}\bigl[x_i - x_j + p_i^{-1/2}z\bigr]\bigr)\varphi(z)\,dz\,\Bigr]. \tag{18}$$


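Lemma 2. For every subset selection procedure $S$, and for every $n = 1,2,\ldots$,

$$0 \le r(\pi_n,S) - (a-1) \le \int_{\Omega}\ \sum_{i\in S((n+1)^{1/2}z)} \Bigl[\,1 - \int_{\mathbb{R}}\prod_{j\ne i}\Phi\bigl(p_j^{1/2}\bigl[n^{1/2}(z_i - z_j) + p_i^{-1/2}z\bigr]\bigr)\varphi(z)\,dz\,\Bigr]
\prod_{r=1}^{k}p_r^{1/2}\,\varphi\bigl(p_r^{1/2}z_r\bigr)\,dz_r. \tag{19}$$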


Proof: It has already been shown that $r(\pi_n,S) \ge a-1$ holds. As to the second inequality, there
is at least one term in the (first and) second sum of (17). Thus, the sum plus $(1-a)$ is less than or
equal to the sum of each term plus $(1-a)$.

The limiting behavior of the Bayes risks of the subset selection rules considered above is
as follows.

Theorem 1. $\lim_{n\to\infty} r(\pi_n,S^{Gen}) = \lim_{n\to\infty} r(\pi_n,S_G) = \lim_{n\to\infty} r(\pi_n,S^{\pi_n}) = a-1$.


Proof: Let $z\in\Omega$ be fixed. The inner integral of (19) converges, as $n$ tends to infinity, to 1 if
$i = i^*(z)$ with $z_{i^*(z)} = z_{[k]}$, and it converges to 0 otherwise. For $S^{Gen}$, the minimum in (18), by
using $r(x) = r((n+1)^{1/2}z)$, occurs, for sufficiently large $n$, at the subset $s^*(z) = \{i^*(z)\}$. Thus, by
Lebesgue's bounded convergence theorem, one can verify that for $S^{Gen}$ the Bayes risk in (19)
tends to $a-1$ as $n$ tends to infinity. Similarly, one can verify the limit for any subset selection
rule of the type $S_G$, as defined below (1), using $S_G(x) = S_G((n+1)^{1/2}z)$. The third limit
follows from the fact that $a-1 \le r(\pi_n,S^{\pi_n}) \le r(\pi_n,S^{Gen})$ holds for all $n$.

The limiting behavior of $n^{k/2}[\,r(\pi_n,S^{\pi_n}) - (a-1)\,]$, as $n$ tends to infinity, remains
unresolved. In view of (16), at $z\in\Omega$, and for $i^*(z)$ given by $z_{i^*(z)} = z_{[k]}$,

$$\lim_{n\to\infty} n^{k/2}\int_{\Omega}\Bigl[\,1 - \int_{\mathbb{R}}\prod_{j\ne i^*(z)}\Phi\bigl(p_j^{1/2}\bigl[n^{1/2}(z_{i^*(z)} - z_j) + p_{i^*(z)}^{-1/2}z\bigr]\bigr)\varphi(z)\,dz\,\Bigr]
\prod_{r=1}^{k}p_r^{1/2}\,\varphi\bigl(p_r^{1/2}z_r\bigr)\,dz_r = \infty. \tag{20}$$

This can be seen by changing variables in (20) with $v_r = n^{1/2}z_r$, $r = 1,\ldots,k$, and then considering
an area of $v\in\Omega$ where the differences of the coordinates are smaller than a suitable bound. On
the other hand, for $i \ne i^*(z)$, the limit of the expression in (20), with $i^*(z)$ replaced by $i$, and
with 1 replaced by $a$, is only known to be less than or equal to zero. This difficulty makes it
infeasible to determine whether (14) holds for $S^{Gen}$ or $S_G$.

For every $n = 1,2,\ldots$, the Bayes rule $S^{\pi_n}$ for prior $\pi_n$ is admissible and thus has the
optimum properties in terms of the probability of a correct selection and the expected subset size
given by Lemma 1 on $\mathbb{R}^k$. From a practical point of view, the difference between $S^{Gen}$ and
$S^{\pi_n}$ becomes negligible for large $n$, and therefore $S^{Gen}$ appears to be suitable for practical use.

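To establish this new subset selection procedure $S^{Gen}$ at the $P^*$-condition, one has to
compromise on the value of $a$ in the loss function and use that value of $a$ for which $P^*$ is equal
to the minimum of the probability of a correct selection. To determine the latter, let

$$q_i(x) = \int_{\mathbb{R}}\prod_{j\ne i}\Phi\bigl(p_j^{1/2}\bigl[x_i - x_j + p_i^{-1/2}z\bigr]\bigr)\varphi(z)\,dz, \qquad i = 1,\ldots,k,\ x\in\mathbb{R}^k. \tag{21}$$

From (18) it follows that at every $x\in\mathbb{R}^k$,

$$S^{Gen}(x) = \begin{cases}
\{\, i \mid q_i(x) \ge a,\ i = 1,\ldots,k \,\}, & \text{if this set is not empty,} \\
\{i_0\} \ \text{for any } i_0 \text{ with } q_{i_0}(x) = \max_{j} q_j(x), & \text{otherwise,}
\end{cases} \tag{22}$$

with the option to drop elements in the first set for which equality occurs. Using the
representation $X_r = \theta_r + p_r^{-1/2}N_r$, $r = 1,\ldots,k$, where $N_1,\ldots,N_k$ are generic i.i.d. standard normal
random variables, the probability of a correct selection of $S^{Gen}$ at $\theta\in\Omega$ with $\theta_{i^*(\theta)} = \theta_{[k]}$ is

$$\begin{aligned}
P\Bigl\{\ &\int_{\mathbb{R}}\prod_{j\ne i^*(\theta)}\Phi\bigl(p_j^{1/2}\bigl[\theta_{[k]} - \theta_j + p_{i^*(\theta)}^{-1/2}N_{i^*(\theta)} - p_j^{-1/2}N_j + p_{i^*(\theta)}^{-1/2}z\bigr]\bigr)\varphi(z)\,dz \ \ge\ a, \ \text{ or} \\
&\int_{\mathbb{R}}\prod_{j\ne i^*(\theta)}\Phi\bigl(p_j^{1/2}\bigl[\theta_{[k]} - \theta_j + p_{i^*(\theta)}^{-1/2}N_{i^*(\theta)} - p_j^{-1/2}N_j + p_{i^*(\theta)}^{-1/2}z\bigr]\bigr)\varphi(z)\,dz \\
&\qquad \ge\ \int_{\mathbb{R}}\prod_{j\ne r}\Phi\bigl(p_j^{1/2}\bigl[\theta_r - \theta_j + p_r^{-1/2}N_r - p_j^{-1/2}N_j + p_r^{-1/2}z\bigr]\bigr)\varphi(z)\,dz \ \text{ for all } r\ne i^*(\theta)\ \Bigr\}.
\end{aligned} \tag{23}$$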

It has been shown in Gupta and Miescke (1988) that for $x_1 = x_2 = \ldots = x_k$, the maximum
of the $k$ values in (21) occurs at $\{\, i \mid p_i = p_{[1]},\ i = 1,\ldots,k \,\}$. Thus, if $x_1,\ldots,x_k$ are sufficiently close
together, the maximum of the $k$ values in (21) may not occur at $i = i^*(x)$. However, for
$p_1 = p_2 = \ldots = p_k$, that maximum always occurs at $i = i^*(x)$, and here $S^{Gen}$ can be established at
the $P^*$-condition with a value of $a$ that depends only on $k$ and $P^*$.
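The quantities in (21) and the rule in (22) can be evaluated with one-dimensional numerical integration. The following is a minimal sketch, assuming Simpson's rule on $[-8, 8]$ and 0-based indices; the function names `q_values` and `s_gen` are hypothetical, and ties in the fallback case are broken by taking the first maximizer.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def q_values(x, p, lo=-8.0, hi=8.0, steps=400):
    """q_i(x) from (21): integral over z of
    prod_{j != i} Phi(p_j^(1/2) * (x_i - x_j + p_i^(-1/2) * z)) * phi(z)."""
    h = (hi - lo) / steps  # steps must be even for Simpson's rule
    k = len(x)
    q = []
    for i in range(k):
        total = 0.0
        for m in range(steps + 1):
            z = lo + m * h
            weight = 1 if m in (0, steps) else (4 if m % 2 else 2)
            prod = STD.pdf(z)
            for j in range(k):
                if j != i:
                    prod *= STD.cdf(p[j] ** 0.5 * (x[i] - x[j] + p[i] ** -0.5 * z))
            total += weight * prod
        q.append(total * h / 3.0)
    return q


def s_gen(x, p, a):
    """Rule (22): all i with q_i(x) >= a, or a single maximizer if there is none."""
    q = q_values(x, p)
    chosen = [i for i, qi in enumerate(q) if qi >= a]
    return chosen if chosen else [max(range(len(x)), key=lambda i: q[i])]
```

Evaluating `q_values([0.0, 0.0, 0.0], [1.0, 1.0, 0.25])` illustrates the effect described above: with equal sample means, the population with the smallest precision receives the largest value.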

Theorem 2. For $p_1 = p_2 = \ldots = p_k = p$, say, $S^{Gen}$ satisfies the $P^*$-condition iff $a$ satisfies

$$P\Bigl\{\, N_k = N_{[k]}, \ \text{ or } \ \int_{\mathbb{R}}\prod_{j=1}^{k-1}\Phi(N_k - N_j + z)\,\varphi(z)\,dz \ge a \,\Bigr\} = P^*. \tag{24}$$

Proof: From (21) - (23), one can see that for $p_1 = p_2 = \ldots = p_k = p$, the probability of a correct
selection under $S^{Gen}$ at $\theta\in\Omega$ with $\theta_k = \theta_{[k]}$ is

$$P\Bigl\{\, \theta_{[k]} + p^{-1/2}N_k \ge \theta_j + p^{-1/2}N_j \ \text{ for all } j < k, \ \text{ or } \ \int_{\mathbb{R}}\prod_{j=1}^{k-1}\Phi\bigl(p^{1/2}[\theta_{[k]} - \theta_j] + N_k - N_j + z\bigr)\varphi(z)\,dz \ge a \,\Bigr\}, \tag{25}$$

which is increasing in $\theta_{[k]}$ and decreasing in $\theta_1,\ldots,\theta_{k-1}$. The minimum of (25) thus occurs at
$\theta_1 = \theta_2 = \ldots = \theta_k$, where it is equal to (24).

For any predetermined $P^*$, the value of $a$ for which (24) holds has to be determined on a
computer with numerical integration or simulation. Likewise, comparisons of the expected subset
sizes of $S^{Gen}$ and another subset selection procedure, such as $S_G$, both meeting the $P^*$-condition,
have to be made in this way.
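One possible way to carry out such a computation is sketched below, assuming equal precisions, a Monte Carlo sample of standard normal vectors, and Simpson's rule for the inner integral in (24). The function names `q_last` and `calibrate_a` are hypothetical, and the returned value is only a simulation-based approximation whose accuracy depends on `reps`.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def q_last(n, lo=-8.0, hi=8.0, steps=200):
    """Integral over z of prod_{j<k} Phi(n_k - n_j + z) * phi(z), with n_k = n[-1]."""
    h = (hi - lo) / steps  # steps must be even for Simpson's rule
    total = 0.0
    for m in range(steps + 1):
        z = lo + m * h
        weight = 1 if m in (0, steps) else (4 if m % 2 else 2)
        prod = STD.pdf(z)
        for nj in n[:-1]:
            prod *= STD.cdf(n[-1] - nj + z)
        total += weight * prod
    return total * h / 3.0


def calibrate_a(k, p_star, reps=2000, seed=0):
    """Monte Carlo approximation of the value a that solves (24)."""
    rng = random.Random(seed)
    w = []
    for _ in range(reps):
        n = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # If N_k is the largest coordinate the event in (24) holds for every a < 1,
        # which is encoded here by the value 1.0.
        w.append(1.0 if n[-1] == max(n) else q_last(n))
    w.sort()
    # P{W >= a} = P* means that a is (approximately) the (1 - P*)-quantile of W.
    return w[int((1.0 - p_star) * reps)]
```

Since the same simulated sample is reused for a fixed seed, a larger $P^*$ never leads to a larger calibrated value of $a$.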

In the second part of this section, considerations similar to those above will now be made
for the "linear loss" (6), but in a more concise manner. The loss function $L_*$ has $-\Delta$ as a lower
bound, which can be achieved. This bound is inherited by the frequentist risk and the Bayes risk,
and can be achieved by the former.

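The Bayes subset selection rules $S_*^{\pi}$ for the normal prior density $\pi(\theta)$, $\theta\in\mathbb{R}^k$, under loss
function $L_*$, minimize the posterior risk at every $X = x\in\mathbb{R}^k$, and are

$$S_*^{\pi}(x) = \begin{cases}
\{\, i \mid E\{\Theta_i \mid X = x\} \ge E\{\Theta_{[k]} \mid X = x\} - \Delta,\ i = 1,\ldots,k \,\}, & \text{if this set is not empty,} \\
\{i_0\} \ \text{for any } i_0 \text{ with } E\{\Theta_{i_0} \mid X = x\} = \max_{j} E\{\Theta_j \mid X = x\}, & \text{otherwise,}
\end{cases} \tag{26}$$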

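with the option to drop elements in the first set for which equality occurs. This follows from
the fact that the Bayes risk of the Bayes subset selection rules $S_*^{\pi}$ is given by

$$r_*(\pi,S_*^{\pi}) = E\Bigl(\,\min_{s(X)\ne\emptyset}\ \sum_{i\in s(X)}\bigl[\,E\{\Theta_{[k]} \mid X\} - E\{\Theta_i \mid X\} - \Delta\,\bigr]\Bigr), \tag{27}$$

and the Bayes risk of any subset selection procedure $S$ is given by

$$r_*(\pi,S) = E\Bigl(\,\sum_{i\in S(X)}\bigl[\,E\{\Theta_{[k]} \mid X\} - E\{\Theta_i \mid X\} - \Delta\,\bigr]\Bigr). \tag{28}$$

The Bayes risk of the Bayes subset selection procedure $S_*^{\pi_n}$ under loss $L_*$ for the prior $\pi_n$,
$n = 1,2,\ldots$, which is given above (12), is

$$\begin{aligned}
r_*(\pi_n,S_*^{\pi_n}) = \int_{\Omega}\ \min_{s(x)\ne\emptyset}\ \sum_{i\in s(x)} &\Bigl[\,E\Bigl(\max_{j=1,\ldots,k}\bigl\{[n/(n+1)]\,x_j + [n/(n+1)]^{1/2}p_j^{-1/2}N_j\bigr\}\Bigr) - [n/(n+1)]\,x_i - \Delta\,\Bigr] \\
&\times \prod_{r=1}^{k}[p_r/(n+1)]^{1/2}\,\varphi\bigl([p_r/(n+1)]^{1/2}x_r\bigr)\,dx_r,
\end{aligned} \tag{29}$$

where $N_1,\ldots,N_k$ are generic i.i.d. standard normal random variables. The minimum in (29) is
taken pointwise, at every $x\in\Omega$, over all $2^k - 1$ non-empty subsets of $\{1,\ldots,k\}$. A change of
variables $x_r = \sqrt{n+1}\,z_r$, $r = 1,\ldots,k$, leads to the alternative representation

$$\begin{aligned}
r_*(\pi_n,S_*^{\pi_n}) = \int_{\Omega}\ \min_{s(z)\ne\emptyset}\ \sum_{i\in s(z)} &\Bigl[\,[n/(n+1)]^{1/2}\,E\Bigl(\max_{j=1,\ldots,k}\bigl\{n^{1/2}(z_j - z_i) + p_j^{-1/2}N_j\bigr\}\Bigr) - \Delta\,\Bigr] \\
&\times \prod_{r=1}^{k}p_r^{1/2}\,\varphi\bigl(p_r^{1/2}z_r\bigr)\,dz_r.
\end{aligned} \tag{30}$$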

The Bayes risk $r_*(\pi_n,S)$ of any other subset selection procedure $S$ can be derived from (29) by
deleting the minimum and replacing $s(x)$ by $S(x)$. Likewise, it can be derived from (30) by
deleting the minimum and replacing $s(z)$ by $S((n+1)^{1/2}z)$.

An interesting subset selection rule is the generalized Bayes rule $S_*^{Gen}$ for the Lebesgue
measure used as a non-informative prior on $\mathbb{R}^k$. It selects, at $x\in\mathbb{R}^k$, any subset which achieves

$$r_*(x) := \min_{s(x)\ne\emptyset}\ \sum_{i\in s(x)}\Bigl[\,E\Bigl(\max_{j=1,\ldots,k}\bigl\{x_j + p_j^{-1/2}N_j\bigr\}\Bigr) - x_i - \Delta\,\Bigr]. \tag{31}$$


The limiting behavior of the Bayes risks of the subset selection rules considered above is
as follows.

Theorem 3. $\lim_{n\to\infty} r_*(\pi_n,S_*^{Gen}) = \lim_{n\to\infty} r_*(\pi_n,S_G) = \lim_{n\to\infty} r_*(\pi_n,S_*^{\pi_n}) = -\Delta$.

Proof: Let $x\in\Omega$ be fixed. Obviously, $i^*(x)$ is an element of $S_G(x)$, $S_*^{Gen}(x)$, and $S_*^{\pi_n}(x)$ for all $n$. That
part of the sum in the respective Bayes risks which is associated with a selection of $i^*(x)$, in the
setting of (30) with $z = (n+1)^{-1/2}x$, tends to $-\Delta$. This follows from

(32)

and the fact that the term in the middle of (32) tends to 0 as $n$ tends to infinity.

Let now $i \ne i^*(x)$. Since for $i\in S_G(x)$, $x_{[k]} - x_i \le \delta$, i.e. $(n+1)^{1/2}(z_{[k]} - z_i) \le \delta$, for


where the indicator function in (33) is 0 for sufficiently large $n$. Thus by Lebesgue's bounded
convergence theorem, the second limit in the theorem holds true. The third follows from
$-\Delta \le r_*(\pi_n,S_*^{\pi_n}) \le r_*(\pi_n,S_G)$, $n = 1,2,\ldots$

For any $i \ne i^*(x)$ with $i\in S_*^{Gen}(x)$, that part of the sum in the Bayes risk of $S_*^{Gen}$
which is associated with a selection of $i$ is bounded by

$$\begin{aligned}
-\Delta &\le [n/(n+1)]^{1/2}\,E\Bigl(\max_{j}\bigl\{n^{1/2}(z_j - z_i) + p_j^{-1/2}N_j\bigr\}\Bigr) - \Delta \\
&\le E\Bigl(\max_{j}\bigl\{(n+1)^{1/2}(z_j - z_i) + [(n+1)/n]^{1/2}p_j^{-1/2}N_j\bigr\}\Bigr) - \Delta \\
&\le \bigl([(n+1)/n]^{1/2} - 1\bigr)\,E\Bigl(\max_{j}\bigl\{p_j^{-1/2}N_j\bigr\}\Bigr) \le E\Bigl(\max_{j}\bigl\{p_j^{-1/2}N_j\bigr\}\Bigr).
\end{aligned} \tag{34}$$

The second inequality in (34) is established by replacing a factor $n/(n+1)$ of the expectation by
one. The third follows from the fact that, according to (31),

$$E\Bigl(\max_{j}\bigl\{(n+1)^{1/2}(z_j - z_i) + p_j^{-1/2}N_j\bigr\}\Bigr) \le \Delta \quad\text{for } i\in S_*^{Gen}\bigl((n+1)^{1/2}z\bigr).$$

This completes the proof of the theorem.


As with the "$(0,a,a-1)$ loss", the limiting behavior of $n^{k/2}[\,r_*(\pi_n,S_*^{\pi_n}) + \Delta\,]$, as $n$
tends to infinity, remains unresolved. This difficulty makes it infeasible to determine whether (14)
holds for $S_*^{Gen}$ or $S_G$.

From (7) it can be seen that the frequentist risk of every Bayes subset selection procedure
is continuous on $\mathbb{R}^k$, and thus all Bayes rules for the prior densities $\pi(\theta)$ and $\pi_n(\theta)$ are
admissible on $\mathbb{R}^k$ in the class of non-randomized subset selection rules. The arguments are the
same as those used below (11). From a practical point of view, the difference between $S_*^{Gen}$
and $S_*^{\pi_n}$ becomes negligible for large $n$, and thus $S_*^{Gen}$ appears to be suitable for practical use.

To establish this new subset selection procedure $S_*^{Gen}$ at the $P^*$-condition, one has to
compromise on the value of $\Delta$ in the loss function $L_*$ and use that value of $\Delta$ for which $P^*$ is
equal to the minimum of the probability of a correct selection. To determine the latter, let

$$P_i(x) = x_i - E\Bigl(\max_{j=1,\ldots,k}\bigl\{x_j + p_j^{-1/2}N_j\bigr\}\Bigr), \qquad i = 1,\ldots,k,\ x\in\mathbb{R}^k.$$

From (31) it follows that at every $x\in\mathbb{R}^k$,

$$S_*^{Gen}(x) = \begin{cases}
\{\, i \mid P_i(x) \ge -\Delta,\ i = 1,\ldots,k \,\}, & \text{if this set is not empty,} \\
\{i_0\} \ \text{for any } i_0 \text{ with } x_{i_0} = x_{[k]}, & \text{otherwise,}
\end{cases}$$

with the option to drop elements in the first set for which equality occurs. The probability of a
correct selection of $S_*^{Gen}$ at $\theta\in\Omega$ with $\theta_{i^*(\theta)} = \theta_{[k]}$ is

$$\begin{aligned}
P^{Z}\Bigl\{\ &\theta_{[k]} + p_{i^*(\theta)}^{-1/2}Z_{i^*(\theta)} - E^{N}\Bigl(\max_{j}\bigl\{\theta_j + p_j^{-1/2}Z_j + p_j^{-1/2}N_j\bigr\}\Bigr) \ge -\Delta, \ \text{ or} \\
&\theta_{[k]} + p_{i^*(\theta)}^{-1/2}Z_{i^*(\theta)} \ge \theta_r + p_r^{-1/2}Z_r \ \text{ for all } r \ne i^*(\theta)\ \Bigr\},
\end{aligned} \tag{38}$$

where $Z_1,\ldots,Z_k$ are i.i.d. generic standard normal random variables which are independent of
$N_1,\ldots,N_k$, and where the superscripts indicate the random variables involved. $S_*^{Gen}$ can be
established at the $P^*$-condition with a value of $\Delta$ that depends on $p_1,\ldots,p_k$, $k$, and $P^*$.
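The generalized Bayes rule under the linear loss only requires the expected maximum of the perturbed sample means, which has no simple closed form but can be approximated by simulation. The sketch below assumes a plain Monte Carlo estimate with a fixed seed and 0-based indices; the function name `s_gen_linear` and the argument name `delta` (the constant of the linear loss) are hypothetical.

```python
import random


def s_gen_linear(x, p, delta, reps=4000, seed=0):
    """Select all i with x_i - E max_j {x_j + p_j^(-1/2) N_j} >= -delta,
    or an index with the largest sample mean if no i qualifies."""
    rng = random.Random(seed)
    # Monte Carlo estimate of E max_j {x_j + p_j^(-1/2) N_j}.
    e_max = sum(
        max(xj + pj ** -0.5 * rng.gauss(0.0, 1.0) for xj, pj in zip(x, p))
        for _ in range(reps)
    ) / reps
    chosen = [i for i, xi in enumerate(x) if xi - e_max >= -delta]
    return chosen if chosen else [max(range(len(x)), key=lambda i: x[i])]
```

Because the expected maximum is never smaller than the largest sample mean, a small value of the loss constant can leave the first set empty, in which case the fallback branch returns a population with the largest sample mean.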
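Theorem 4. $S_*^{Gen}$ satisfies the $P^*$-condition iff $\Delta$ satisfies

$$\min_{i=1,\ldots,k} P^{Z}\Bigl\{\, p_i^{-1/2}Z_i = \max_{j}\bigl\{p_j^{-1/2}Z_j\bigr\}, \ \text{ or } \ p_i^{-1/2}Z_i - E^{N}\Bigl(\max_{j}\bigl\{p_j^{-1/2}(Z_j + N_j)\bigr\}\Bigr) \ge -\Delta \,\Bigr\} = P^*. \tag{39}$$

Proof: The assertion follows from the fact that the probability in (38) is increasing in $\theta_{[k]}$ and
decreasing in $\theta_{[1]},\ldots,\theta_{[k-1]}$.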

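It is interesting to note that for $p_1 = p_2 = \ldots = p_k = p$, say, (39) simplifies to

$$P^{Z}\Bigl\{\, Z_1 = Z_{[k]}, \ \text{ or } \ Z_1 - E^{N}\Bigl(\max_{j}\bigl\{Z_j + N_j\bigr\}\Bigr) \ge -p^{1/2}\Delta \,\Bigr\} = P^*. \tag{40}$$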

For any predetermined $P^*$, the value of $\Delta$ for which (39) or (40) holds has to be
determined on a computer with numerical integration or simulation. Likewise, comparisons of
the expected subset sizes of $S_*^{Gen}$ and another subset selection procedure, such as $S_G$, both
meeting the $P^*$-condition, have to be made in this way.

In conclusion, the extension of the admissibility results from non-randomized to
randomized subset selection rules will be described briefly. A randomized subset selection rule
$S^*$, say, can be represented by $S^*(x,u) = \{\, i \mid u_i \le p_{S^*,i}(x),\ i = 1,\ldots,k \,\}$, where at every $x\in\mathbb{R}^k$,
$p_{S^*,r}(x)$ denotes the probability of including population $P_r$ in the subset, $r = 1,\ldots,k$, and where
$U_1,\ldots,U_k$ are generic i.i.d. random variables, each uniformly distributed on $[0,1]$, which are
independent of $X$. For such a randomized subset selection rule $S^*(X,U)$, one has

$$P_\theta\{r\in S^*(X,U)\} = P_\theta\{U_r \le p_{S^*,r}(X)\} = E_\theta\bigl(p_{S^*,r}(X)\bigr), \qquad \theta\in\mathbb{R}^k,\ r = 1,\ldots,k. \tag{41}$$

The risk function of a randomized subset selection rule $S^*$ under loss function $L$ is given by

$$R(\theta,S^*) = E_\theta\bigl(L(\theta,S^*(X,U))\bigr) = a\sum_{i=1}^{k}E_\theta\bigl(p_{S^*,i}(X)\bigr) - \sum_{j\in A(\theta)}E_\theta\bigl(p_{S^*,j}(X)\bigr), \qquad \theta\in\mathbb{R}^k, \tag{42}$$

where $A(\theta) = \{\, j \mid \theta_j = \theta_{[k]} \,\}$. By the same arguments that have been used before, for every
randomized subset selection rule $S^*$, $R(\theta,S^*)$ is continuous on $\Omega$. Every $S^*$ that is admissible on
$\Omega$ under $L$ within the class of all randomized subset selection rules must also be admissible on
$\mathbb{R}^k$, since for any randomized subset selection rule $S^*$, $E_\theta(p_{S^*,r}(X))$ is continuous at every
$\theta\in\mathbb{R}^k$, $r = 1,\ldots,k$. For loss function $L_*$ similar arguments apply, but are omitted for brevity.